Visual Assignment

Liu Manye
07-13-2021

Mini Challenge 2 Background

In the roughly twenty years that Tethys-based GAStech has been operating a natural gas production site in the island country of Kronos, it has produced remarkable profits and developed strong relationships with the government of Kronos. However, GAStech has not been as successful in demonstrating environmental stewardship.

In January, 2014, the leaders of GAStech are celebrating their new-found fortune as a result of the initial public offering of their very successful company. In the midst of this celebration, several employees of GAStech go missing. An organization known as the Protectors of Kronos (POK) is suspected in the disappearance, but things may not be what they seem.

As an expert in visual analytics, you are called in to help law enforcement from Kronos and Tethys.

In Mini-Challenge 2, you are asked to analyze movement and tracking data. GAStech provides many of their employees with company cars for their personal and professional use, but unbeknownst to the employees, the cars are equipped with GPS tracking devices. You are given tracking data for the two weeks leading up to the disappearance, as well as credit card transactions and loyalty card usage data. From this data, I will try to solve the given questions, identify anomalies and suspicious behaviors, and identify which people use which credit and loyalty cards.

Question 1

Using just the credit and loyalty card data, identify the most popular locations, and when they are popular. What anomalies do you see? What corrections would you recommend to correct these anomalies? Please limit your answer to 8 images and 300 words.

R setup

R is the only tool used in this assignment.

Run the code below to set up the R environment.

The code chunk checks whether each required package is installed and installs any that are missing. It then loads every package into the current working environment.

packages = c('igraph', 'tidygraph', 'ggraph', 'visNetwork', 'lubridate', 'clock', 'tidyverse', 'dplyr','raster', 'sf', 'tmap', 'gifski', 'mapview')

for (p in packages) {
  # Install the package only if it cannot be loaded
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}

Import Dataset

cc_data.csv and loyalty_data.csv are the two datasets required to answer Question 1. I used read_csv to import them in the code chunk below:

cc <- read_csv("MC2/cc_data.csv")
loyalty <- read_csv("MC2/loyalty_data.csv")
glimpse(cc)
Rows: 1,490
Columns: 4
$ timestamp  <chr> "1/6/2014 7:28", "1/6/2014 7:34", "1/6/2014 7:35"~
$ location   <chr> "Brew've Been Served", "Hallowed Grounds", "Brew'~
$ price      <dbl> 11.34, 52.22, 8.33, 16.72, 4.24, 4.17, 28.73, 9.6~
$ last4ccnum <dbl> 4795, 7108, 6816, 9617, 7384, 5368, 7253, 4948, 9~
glimpse(loyalty)
Rows: 1,392
Columns: 4
$ timestamp  <chr> "1/6/2014", "1/6/2014", "1/6/2014", "1/6/2014", "~
$ location   <chr> "Brew've Been Served", "Brew've Been Served", "Ha~
$ price      <dbl> 4.17, 9.60, 16.53, 11.51, 12.93, 4.27, 11.20, 15.~
$ loyaltynum <chr> "L2247", "L9406", "L8328", "L6417", "L1107", "L40~

Data Preparation

The glimpse of the cc and loyalty data shows that the timestamps in both files are stored as character strings. Run the following code to convert the timestamps to a proper date-time format:

cc$timestamp <- date_time_parse(cc$timestamp,
                zone = "",
                format = "%m/%d/%Y %H:%M")
glimpse(cc)
Rows: 1,490
Columns: 4
$ timestamp  <dttm> 2014-01-06 07:28:00, 2014-01-06 07:34:00, 2014-0~
$ location   <chr> "Brew've Been Served", "Hallowed Grounds", "Brew'~
$ price      <dbl> 11.34, 52.22, 8.33, 16.72, 4.24, 4.17, 28.73, 9.6~
$ last4ccnum <dbl> 4795, 7108, 6816, 9617, 7384, 5368, 7253, 4948, 9~

The loyalty dataset contains only date information, so its format string is “%m/%d/%Y” instead of “%m/%d/%Y %H:%M”:

loyalty$timestamp <- date_time_parse(loyalty$timestamp,
                zone = "",
                format = "%m/%d/%Y")
glimpse(loyalty)
Rows: 1,392
Columns: 4
$ timestamp  <dttm> 2014-01-06, 2014-01-06, 2014-01-06, 2014-01-06, ~
$ location   <chr> "Brew've Been Served", "Brew've Been Served", "Ha~
$ price      <dbl> 4.17, 9.60, 16.53, 11.51, 12.93, 4.27, 11.20, 15.~
$ loyaltynum <chr> "L2247", "L9406", "L8328", "L6417", "L1107", "L40~

Both tables contain date information about the spending behaviour of GAStech staff. To combine these two separate tables into one, we need to extract the date from the cc timestamps so that the tables share a common key. The code chunk below does that.

cc$Date <- format(cc$timestamp, format="%Y-%m-%d")
cc$Date <- date_time_parse(cc$Date,
                           zone = "",
                           format = "%Y-%m-%d")
glimpse(cc)
Rows: 1,490
Columns: 5
$ timestamp  <dttm> 2014-01-06 07:28:00, 2014-01-06 07:34:00, 2014-0~
$ location   <chr> "Brew've Been Served", "Hallowed Grounds", "Brew'~
$ price      <dbl> 11.34, 52.22, 8.33, 16.72, 4.24, 4.17, 28.73, 9.6~
$ last4ccnum <dbl> 4795, 7108, 6816, 9617, 7384, 5368, 7253, 4948, 9~
$ Date       <dttm> 2014-01-06, 2014-01-06, 2014-01-06, 2014-01-06, ~
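
As a cross-check, the same parsing and date extraction can be sketched in base R, without the clock package. The two sample strings are copied from the glimpse() output above. The time zone "UTC" is an assumption made only to keep the example reproducible; the chunks above pass zone = "", which is assumed here to mean the session's local time zone.

```r
# Sample strings copied from the glimpse() output; tz = "UTC" is an assumption
raw <- c("1/6/2014 7:28", "1/6/2014 7:34")

# Parse month/day/year hour:minute strings into POSIXct date-times
parsed <- as.POSIXct(raw, format = "%m/%d/%Y %H:%M", tz = "UTC")

# Drop the time of day to get a Date that can serve as a join key
day_only <- as.Date(parsed, tz = "UTC")

print(parsed)
print(day_only)
```

Storing the key as a Date rather than a date-time is a design choice: it avoids any accidental mismatch caused by a non-zero time component or a time zone shift.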

Now that both tables share a date column, full_join() is used to join the cc data and the loyalty data. In the new table, every row from both cc and loyalty is kept, whether or not an exact match exists.

card_joined <- cc %>%
  full_join(loyalty, by = c("Date" = "timestamp", "location", "price"))

card_joined gives a clearer picture of each credit card holder’s spending pattern, including the timestamp, location, amount spent, credit card number, and loyalty card number. This lets us see every existing credit card and loyalty card combination and explore whether staff used each other’s loyalty cards during the two weeks.
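
To make the behaviour of the full join concrete, here is a minimal sketch on toy data. The card numbers, locations, and prices are made up for illustration and are not taken from the MC2 files. It shows that rows without a partner survive the join with NA in the other table's columns, which is how unmatched transactions can be counted.

```r
library(dplyr)

# Toy data (made up for illustration; not taken from the MC2 files)
cc_toy <- tibble(
  Date       = as.Date(c("2014-01-06", "2014-01-06", "2014-01-07")),
  location   = c("Cafe A", "Cafe B", "Cafe A"),
  price      = c(4.50, 12.00, 7.25),
  last4ccnum = c(1111, 2222, 1111)
)
loyalty_toy <- tibble(
  timestamp  = as.Date(c("2014-01-06", "2014-01-07", "2014-01-07")),
  location   = c("Cafe A", "Cafe A", "Cafe C"),
  price      = c(4.50, 7.25, 3.00),
  loyaltynum = c("L0001", "L0001", "L0002")
)

# Same join keys as card_joined: date, location, and price
joined_toy <- cc_toy %>%
  full_join(loyalty_toy, by = c("Date" = "timestamp", "location", "price"))

# Split the result into matched rows and rows found in only one table
matched      <- joined_toy %>% filter(!is.na(last4ccnum), !is.na(loyaltynum))
cc_only      <- joined_toy %>% filter(is.na(loyaltynum))
loyalty_only <- joined_toy %>% filter(is.na(last4ccnum))

print(joined_toy)
```

Because price is part of the key, a purchase whose recorded price differs between the two sources would appear twice, once as a credit-card-only row and once as a loyalty-only row.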

Find popular locations for credit card transactions: the code chunk below counts the number of transactions at each location and sorts the locations in descending order.

popularLoc_cc <- cc %>%
  group_by(location) %>%
  summarize(count = n()) %>%
  arrange(desc(count))

Similarly, the next code chunk finds the popular locations based on how often staff used their loyalty cards:

popularLoc_loyalty <- loyalty %>%
  group_by(location) %>%
  summarize(count = n()) %>%
  arrange(desc(count))

popularLoc_cc contains one more location than popularLoc_loyalty: Daily Dealz. Daily Dealz appears only once, so it does not affect our answer to Q1.

popularLoc_cc %>%
  anti_join(popularLoc_loyalty, by = c("location"))
# A tibble: 1 x 2
  location    count
  <chr>       <int>
1 Daily Dealz     1

Charting and Analysis

To illustrate the locations where GAStech staff shop most frequently, two bar plots are drawn below.

The code chunk below creates a table of the top 6 locations by credit card usage, which we can then plot:

popular_top6_credit <- popularLoc_cc %>%
  # popularLoc_cc already has one row per location, so no reshaping is needed
  arrange(desc(count)) %>%
  top_n(6, count)

popular_top6_credit
# A tibble: 6 x 2
  location            count
  <chr>               <int>
1 Katerina's Cafe       212
2 Hippokampos           171
3 Guy's Gyros           158
4 Brew've Been Served   156
5 Hallowed Grounds       92
6 Ouzeri Elian           87

Run the code chunk below to draw the bar chart:

top6Loc_credit<-ggplot(data=popular_top6_credit, aes(x=location, y=count)) +
  geom_bar(stat="identity", fill="steelblue")+
  geom_text(aes(label=count), position=position_dodge(width=0.9), vjust=-0.25) +
  theme_minimal()

print(top6Loc_credit + ggtitle("Top 6 Popular Locations Based on Credit Card Usage"))

The code chunk below creates a table of the top 6 locations by loyalty card usage, which we can then plot:

popular_top6_loyalty <- popularLoc_loyalty %>%
  # popularLoc_loyalty already has one row per location, so no reshaping is needed
  arrange(desc(count)) %>%
  top_n(6, count)

popular_top6_loyalty
# A tibble: 6 x 2
  location            count
  <chr>               <int>
1 Katerina's Cafe       195
2 Hippokampos           155
3 Guy's Gyros           146
4 Brew've Been Served   140
5 Ouzeri Elian           84
6 Hallowed Grounds       80

Run the code chunk below to draw the bar chart for popular_top6_loyalty table:

top6Loc_loyalty<-ggplot(data=popular_top6_loyalty, aes(x=location, y=count)) +
  geom_bar(stat="identity", fill="steelblue")+
  geom_text(aes(label=count), position=position_dodge(width=0.9), vjust=-0.25) +
  theme_minimal()

print(top6Loc_loyalty + ggtitle("Top 6 Popular Locations Based on Loyalty Card Usage"))

The two popularLoc tables share the same top six locations: “Katerina’s Cafe”, “Hippokampos”, “Guy’s Gyros”, “Brew’ve Been Served”, “Ouzeri Elian”, and “Hallowed Grounds”. Ouzeri Elian and Hallowed Grounds swap places between the two rankings, but the difference is too small to matter for this exploration.

Create a table that contains only the popular locations:

popular_locations <- card_joined %>%
  filter(location %in% c("Katerina's Cafe", "Hippokampos", "Guy's Gyros", "Brew've Been Served", "Ouzeri Elian", "Hallowed Grounds")) %>%
  drop_na(timestamp) %>%
  dplyr::select(-Date)

According to this table, Katerina’s Cafe, Hippokampos, Guy’s Gyros, and Ouzeri Elian are places that sell food, so they are popular during the lunch break (12:00 to 14:00) and at dinner time (19:00 to 21:00). Brew’ve Been Served and Hallowed Grounds, on the other hand, mainly sell coffee. They are popular only between 7:30 and 8:30 am, because GAStech staff visit them to grab a coffee before work.
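
The busy hours described above can be read off a count of transactions per location and hour. The sketch below shows one way to build such a count. The toy transactions are made up for illustration; with the real data, the input would be the popular_locations table and its timestamp column.

```r
library(dplyr)

# Toy transactions (made up for illustration; not taken from the MC2 files)
tx <- tibble(
  timestamp = as.POSIXct(c("2014-01-06 07:35", "2014-01-06 07:50",
                           "2014-01-06 12:40", "2014-01-06 13:10",
                           "2014-01-06 13:25"), tz = "UTC"),
  location  = c("Coffee Shop", "Coffee Shop",
                "Lunch Place", "Lunch Place", "Lunch Place")
)

# Extract the hour of day, then count visits per location and hour
visits_by_hour <- tx %>%
  mutate(hour = as.integer(format(timestamp, "%H"))) %>%
  count(location, hour, name = "visits") %>%
  arrange(location, hour)

print(visits_by_hour)
```

A table in this shape can be passed directly to ggplot() with hour on the x axis and one facet per location.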

Anomalies

To promote local businesses, Kronos based companies provide a Kronos Kares benefit card to GAStech employees giving them discounts and rewards in exchange for collecting information about their credit card purchases and preferences as recorded on loyalty cards. Because the loyalty card is a discount card meant to promote local business and to collect each staff member’s spending information, it is abnormal to see loyalty cards shared between GAStech staff. Anomalies like this imply a possible hidden relationship between the owners of the loyalty cards that were shared.

Run the code chunk below to find the credit cards involved in the anomalies discussed above:

abnormal_cc <- popular_locations %>%
  drop_na(loyaltynum) %>%
  group_by(last4ccnum) %>%
  summarize(loy_n = n_distinct(loyaltynum)) %>%
  filter(loy_n > 1)

abnormal_cc
# A tibble: 7 x 2
  last4ccnum loy_n
       <dbl> <int>
1       1286     2
2       4795     2
3       4948     2
4       5368     2
5       5921     2
6       7889     2
7       8332     2
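
The check above groups by credit card. The same idea can be run in the other direction to look for loyalty cards that appear with more than one credit card. This is a sketch on made-up toy pairs, not a result from the MC2 data; the column names follow card_joined.

```r
library(dplyr)

# Toy card pairs (made up for illustration; not taken from the MC2 files)
pairs_toy <- tibble(
  last4ccnum = c(1111, 1111, 2222, 3333),
  loyaltynum = c("L0001", "L0001", "L0001", "L0002")
)

# Loyalty cards that were used together with more than one credit card
shared_loyalty <- pairs_toy %>%
  group_by(loyaltynum) %>%
  summarize(cc_n = n_distinct(last4ccnum)) %>%
  filter(cc_n > 1)

print(shared_loyalty)
```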

Question 2

Add the vehicle data to your analysis of the credit and loyalty card data. How does your assessment of the anomalies in question 1 change based on this new data? What discrepancies between vehicle, credit, and loyalty card data do you find? Please limit your answer to 8 images and 500 words.

Importing

Import the raster file MC2-tourist.tif into R by using raster() of the raster package.

bgmap <- raster("MC2/MC2-tourist.tif")
bgmap
class      : RasterLayer 
band       : 1  (of  3  bands)
dimensions : 1595, 2706, 4316070  (nrow, ncol, ncell)
resolution : 3.16216e-05, 3.16216e-05  (x, y)
extent     : 24.82419, 24.90976, 36.04499, 36.09543  (xmin, xmax, ymin, ymax)
crs        : +proj=longlat +datum=WGS84 +no_defs 
source     : MC2-tourist.tif 
names      : MC2.tourist 
values     : 0, 255  (min, max)

The code chunk below uses tm_raster() of the tmap package to plot the raster layer.

tmap_mode("plot")
tm_shape(bgmap) +
  tm_raster(bgmap,
            legend.show = FALSE)

tm_shape(bgmap) +
tm_rgb(bgmap, r = 1,g = 2,b = 3,
       alpha = NA,
       saturation = 1,
       interpolate = TRUE,
       max.value = 255)

Importing the vector GIS data file: the Abila GIS data layer is in ESRI shapefile format. The code chunk below uses st_read() of the sf package to import the Abila shapefile into R.

Abila_st <- st_read(dsn = "MC2/Geospatial",
                    layer = "Abila")
Reading layer `Abila' from data source 
  `C:\LiuManye-dotcom\DataViz_blog\_posts\2021-07-13-visual-assignment\MC2\Geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 3290 features and 9 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: 24.82401 ymin: 36.04502 xmax: 24.90997 ymax: 36.09492
Geodetic CRS:  WGS 84

Use read_csv() of the readr package to import gps2.csv into R.

gps2 <- read_csv("MC2/gps2.csv")
glimpse(gps2)
Rows: 685,169
Columns: 6
$ Timestamp         <chr> "1/6/2014 7:20", "1/6/2014 7:20", "1/6/201~
$ id                <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
$ lat               <dbl> 36.06646, 36.06634, 36.06615, 36.06613, 36~
$ long              <dbl> 24.88258, 24.88259, 24.88258, 24.88258, 24~
$ `Time Difference` <time>       NA, 00:02:00, 00:03:00, 00:01:00, 0~
$ Seconds           <dbl> 0, 2, 3, 1, 3, 1, 1, 1, 4, 1, 1, 2, 3, 1, ~

The Time Difference column was added to the gps file beforehand. For each car id, it is the Timestamp of a GPS record minus the Timestamp of the previous record. Expressing that difference in seconds gives the Seconds column. This gives us the time interval between a person’s movements, which is used later to infer spending behaviour.

The Timestamp field is not in the right date-time format. Run the code chunk below to change its data type:
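
The Time Difference column was prepared outside this post, so the exact steps are not shown. The following is only a sketch of how such a per-car gap could be computed in R, on made-up toy timestamps. Setting the first record of each car to 0 mirrors the Seconds column above; the 180-second threshold is the one used later in this post.

```r
library(dplyr)

# Toy GPS records (made up for illustration; not taken from gps2.csv)
gps_toy <- tibble(
  id = c(1, 1, 1, 2, 2),
  Timestamp = as.POSIXct(c("2014-01-06 07:20:01", "2014-01-06 07:20:03",
                           "2014-01-06 07:25:03", "2014-01-06 08:00:00",
                           "2014-01-06 08:00:04"), tz = "UTC")
)

# Gap in seconds between each record and the previous record of the same car
gps_gaps <- gps_toy %>%
  group_by(id) %>%
  arrange(Timestamp, .by_group = TRUE) %>%
  mutate(
    Seconds = as.numeric(difftime(Timestamp, lag(Timestamp), units = "secs")),
    Seconds = coalesce(Seconds, 0)  # first record of each car has no predecessor
  ) %>%
  ungroup()

# Records that follow a pause of more than 3 minutes
long_gaps <- gps_gaps %>% filter(Seconds > 180)

print(gps_gaps)
```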

gps2$Timestamp <- date_time_parse(gps2$Timestamp,
                zone = "",
                format = "%m/%d/%Y %H:%M")
gps2$id <- as_factor(gps2$id)

glimpse(gps2)
Rows: 685,169
Columns: 6
$ Timestamp         <dttm> 2014-01-06 07:20:00, 2014-01-06 07:20:00,~
$ id                <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
$ lat               <dbl> 36.06646, 36.06634, 36.06615, 36.06613, 36~
$ long              <dbl> 24.88258, 24.88259, 24.88258, 24.88258, 24~
$ `Time Difference` <time>       NA, 00:02:00, 00:03:00, 00:01:00, 0~
$ Seconds           <dbl> 0, 2, 3, 1, 3, 1, 1, 1, 4, 1, 1, 2, 3, 1, ~

The code chunk below converts the gps data frame into a simple feature data frame by using st_as_sf() of the sf package.

gps_sf <- st_as_sf(gps2, 
                   coords = c("long", "lat"),
                       crs= 4326)

gps_sf
Simple feature collection with 685169 features and 4 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 24.82509 ymin: 36.04802 xmax: 24.90849 ymax: 36.08996
Geodetic CRS:  WGS 84
# A tibble: 685,169 x 5
   Timestamp           id    `Time Difference` Seconds
 * <dttm>              <fct> <time>              <dbl>
 1 2014-01-06 07:20:00 1        NA                   0
 2 2014-01-06 07:20:00 1     02'00"                  2
 3 2014-01-06 07:20:00 1     03'00"                  3
 4 2014-01-06 07:20:00 1     01'00"                  1
 5 2014-01-06 07:20:00 1     03'00"                  3
 6 2014-01-06 07:20:00 1     01'00"                  1
 7 2014-01-06 07:20:00 1     01'00"                  1
 8 2014-01-06 07:20:00 1     01'00"                  1
 9 2014-01-06 07:20:00 1     04'00"                  4
10 2014-01-06 07:20:00 1     01'00"                  1
# ... with 685,159 more rows, and 1 more variable:
#   geometry <POINT [°]>

Run the chunk below to split Timestamp in gps_sf into day, hour, and minute columns:

gps_sf$day <- format(gps_sf$Timestamp, format="%d")
gps_sf$hour <- format(gps_sf$Timestamp, format="%H")
gps_sf$minute <- format(gps_sf$Timestamp, format="%M")

To exclude ordinary driving and brief traffic stops, the code below keeps only the GPS records that follow a time interval of more than 3 minutes (180 seconds). This gives us the location at the beginning of each trip:

more_than_3mins <- gps_sf %>%
  filter(Seconds > 180)

Creating Movement Path from GPS Points

The code chunk below joins the gps points into movement paths, grouping by each car’s id, day, and hour.

gps_path <- gps_sf %>%
  group_by(id, day, hour) %>%
  summarize(m = mean(Timestamp), 
            do_union=FALSE) %>%
  st_cast("LINESTRING")

Count the points in each path with npts() of the mapview package, attach the counts to gps_path as column p, and drop paths with only one point, which cannot be drawn as a line:

p = npts(gps_path, by_feature = TRUE)
gps_path2 <- cbind(gps_path, p) %>%
  filter(p>1)
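
As an alternative sketch, the point count can be computed inside summarize() itself, so that single-point groups are dropped before the cast to LINESTRING. The toy coordinates below are illustrative only, and grouping by id alone is a simplification of the id, day, hour grouping used above.

```r
library(sf)
library(dplyr)

# Toy GPS points (coordinates are illustrative); car 2 has a single point
pts <- data.frame(
  id   = c(1, 1, 1, 2),
  long = c(24.88258, 24.88259, 24.88270, 24.87000),
  lat  = c(36.06646, 36.06634, 36.06615, 36.05000)
)
pts_sf <- st_as_sf(pts, coords = c("long", "lat"), crs = 4326)

paths <- pts_sf %>%
  group_by(id) %>%
  summarize(n_pts = n(), do_union = FALSE) %>%  # keep point order, count points
  filter(n_pts > 1) %>%                         # a line needs at least two points
  st_cast("LINESTRING")

print(paths)
```

Counting with n() keeps the whole step inside one pipeline and avoids a separate cbind() call.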

Plotting line graphs

gps_path_selected <- gps_path2 %>%
  filter(id==29, hour == "20")
tmap_mode("view")
tm_shape(bgmap) +
  tm_rgb(bgmap, r = 1,g = 2,b = 3,
       alpha = NA,
       saturation = 1,
       interpolate = TRUE,
       max.value = 255) +
  tm_shape(gps_path_selected) +
  tm_lines()

Next, keep the observations that seem to mark the meaningful start of a trip. These observations are very likely to follow a credit card payment.

gps_dot <- more_than_3mins %>%
  group_by(id, hour, day, minute) %>%
  summarize(geo_n = n_distinct(geometry)) %>%
  st_cast("POINT")

Drawing dot graph

Using findings from Q1 to explore

The abnormal credit cards were identified at the end of Q1. The line plot and dot plot above are used to examine the dates and times of the abnormal spending and to check whether any abnormal GPS movements coincide with them.

The investigation result is presented below:

Abnormal_cases <- read_csv("MC2/abnormal_cc.csv")

Abnormal_cases
# A tibble: 9 x 6
  CC_number Loyalty_number Car_ID Name            CurrentEmploymentTy~
      <dbl> <chr>          <chr>  <chr>           <chr>               
1      1286 L3572          22     Adra Nubarron   Security            
2      1286 L3288          22     Adra Nubarron   Security            
3      4795 L8566          34     Edvard Vann     Security            
4      4948 L3295          18     Birgitta Frente Engineering         
5      5921 L9406          29     Bertrand Ovan   Facilities          
6      5921 L3295          29     Bertrand Ovan   Facilities          
7      7889 L6119          8      Lucas Alcazar   Information Technol~
8      7889 L2247          8/22/6 -               -                   
9      8332 L2070          10     Ada Campo-Corr~ Executive           
# ... with 1 more variable: CurrentEmploymentTitle <chr>

Question 3

Can you infer the owners of each credit card and loyalty card? What is your evidence? Where are there uncertainties in your method? Where are there uncertainties in the data? Please limit your answer to 8 images and 500 words.

Combine data into one

Import the car-assignments data into R:

car <- read_csv("MC2/car-assignments.csv")
glimpse(car)
Rows: 44
Columns: 5
$ LastName               <chr> "Calixto", "Azada", "Balas", "Barranc~
$ FirstName              <chr> "Nils", "Lars", "Felix", "Ingrid", "I~
$ CarID                  <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12~
$ CurrentEmploymentType  <chr> "Information Technology", "Engineerin~
$ CurrentEmploymentTitle <chr> "IT Helpdesk", "Engineer", "Engineer"~

Drop the rows without a CarID and convert CarID to a factor:

car <- car %>%
  drop_na(CarID)

car$CarID <- as_factor(car$CarID)

glimpse(car)
Rows: 35
Columns: 5
$ LastName               <chr> "Calixto", "Azada", "Balas", "Barranc~
$ FirstName              <chr> "Nils", "Lars", "Felix", "Ingrid", "I~
$ CarID                  <fct> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12~
$ CurrentEmploymentType  <chr> "Information Technology", "Engineerin~
$ CurrentEmploymentTitle <chr> "IT Helpdesk", "Engineer", "Engineer"~
glimpse(gps2)
Rows: 685,169
Columns: 6
$ Timestamp         <dttm> 2014-01-06 07:20:00, 2014-01-06 07:20:00,~
$ id                <fct> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
$ lat               <dbl> 36.06646, 36.06634, 36.06615, 36.06613, 36~
$ long              <dbl> 24.88258, 24.88259, 24.88258, 24.88258, 24~
$ `Time Difference` <time>       NA, 00:02:00, 00:03:00, 00:01:00, 0~
$ Seconds           <dbl> 0, 2, 3, 1, 3, 1, 1, 1, 4, 1, 1, 2, 3, 1, ~

Join car-assignment data and gps data together:

car_gps <- car %>%
  full_join(gps2, by = c("CarID" = "id"))

Edit columns: combine the long and lat columns into a single geometry column, and combine FirstName and LastName to get the full name of each staff member.

car_gps <- st_as_sf(car_gps, 
                   coords = c("long", "lat"),
                       crs= 4326)

car_gps <- car_gps %>%
  unite("Name", FirstName, LastName, sep = " ")

Matching gps and credit card spending observations

As mentioned before, a GPS observation that follows a time interval of more than 3 minutes is very likely to appear at a location where a credit card purchase has just happened, because people generally leave a place soon after paying their bills.

Following this logic, I used the line path and dot graphs above to find the movement pattern on a particular day or at a particular time, and then matched it to the credit card holder with the most similar spending pattern. The matching result is presented below:

match_result <- read_csv("MC2/total_match.csv")

match_result
# A tibble: 44 x 6
   CC_number Loyalty_number Car_ID Name           CurrentEmploymentTy~
       <dbl> <chr>           <dbl> <chr>          <chr>               
 1      9551 L5777               1 Nils Calixto   Information Technol~
 2      1415 L7783               2 Lars Azada     Engineering         
 3      9635 L3191               3 Felix Balas    Engineering         
 4      7688 L4164               4 Ingrid Barran~ Executive           
 5      6899 L6267               5 Isak Baza      Information Technol~
 6      7253 L1682               6 Linnea Bergen  Information Technol~
 7      2540 L5947               7 Elsa Orilla    Engineering         
 8      1877 L3014               9 Gustav Cazar   Engineering         
 9      1311 L4149              11 Axel Calzas    Engineering         
10      7108 L6544              12 Hideki Cocina~ Security            
# ... with 34 more rows, and 1 more variable:
#   CurrentEmploymentTitle <chr>
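
The matching above was done by reading the maps. The sketch below illustrates how the same logic could be automated; it is not the procedure used to produce total_match.csv. It assumes a hypothetical stops table in which each trip start has already been labelled with the nearest location name, and a hypothetical 15-minute window between a purchase and the car's departure. All values are made up for illustration.

```r
library(dplyr)

# Hypothetical trip starts, already labelled with a location (toy data)
stops <- tibble(
  car_id   = c(1, 1, 2, 2),
  location = c("Cafe A", "Cafe B", "Cafe A", "Cafe C"),
  depart   = as.POSIXct(c("2014-01-06 07:40", "2014-01-06 12:50",
                          "2014-01-06 07:55", "2014-01-06 13:05"), tz = "UTC")
)

# Toy credit card purchases
purchases <- tibble(
  last4ccnum = c(1111, 1111, 2222, 2222),
  location   = c("Cafe A", "Cafe B", "Cafe A", "Cafe C"),
  timestamp  = as.POSIXct(c("2014-01-06 07:35", "2014-01-06 12:45",
                            "2014-01-06 07:50", "2014-01-06 13:00"), tz = "UTC")
)

# Pair every stop with every purchase at the same location, keep pairs where
# the car left within 15 minutes after the purchase (assumed window), and
# pick the card that co-occurs most often with each car
best_match <- merge(stops, purchases, by = "location") %>%
  mutate(gap_min = as.numeric(difftime(depart, timestamp, units = "mins"))) %>%
  filter(gap_min >= 0, gap_min <= 15) %>%
  count(car_id, last4ccnum, name = "hits") %>%
  group_by(car_id) %>%
  slice_max(hits, n = 1, with_ties = FALSE) %>%
  ungroup()

print(best_match)
```

The hits column gives a rough sense of how much evidence supports each pairing; cars with few hits or near-ties between cards are where the matching is most uncertain.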

Question 4

Given the data sources provided, identify potential informal or unofficial relationships among GASTech personnel. Provide evidence for these relationships. Please limit your response to 8 images and 500 words.

Plot path plot

gps_path_selected <- gps_path2 %>%
  filter( id %in% c(7,33))
tmap_mode("view")
tm_shape(bgmap) +
  tm_rgb(bgmap, r = 1,g = 2,b = 3,
       alpha = NA,
       saturation = 1,
       interpolate = TRUE,
       max.value = 255) +
  tm_shape(gps_path_selected) +
  tm_lines(col = 'id', style = "fixed")

From the line chart above, we can see a strong similarity between the paths of CarID 7 and CarID 33. These two people always showed up on the same roads and ended at the same destinations.

car2 <- car %>%
  filter(CarID %in% c(33, 7))

car2
# A tibble: 2 x 5
  LastName  FirstName CarID CurrentEmploymentTy~ CurrentEmploymentTit~
  <chr>     <chr>     <fct> <chr>                <chr>                
1 Orilla    Elsa      7     Engineering          Drill Technician     
2 Tempestad Brand     33    Engineering          Drill Technician     

CarID 7 is Elsa Orilla, a drill technician in Engineering. CarID 33 is Brand Tempestad, who is also a drill technician in Engineering.

According to the graph, Elsa and Brand ate lunch together at Ouzeri Elian at 13:22 on January 6th. They also showed up quite frequently at the Chostus Hotel: they were there at 12:56 on January 8th and at 13:17 on January 14th. All these movements indicate an intimate relationship between Elsa and Brand.

gps_path_selected <- gps_path2 %>%
  filter( id %in% c(22,30, 15))
tmap_mode("view")
tm_shape(bgmap) +
  tm_rgb(bgmap, r = 1,g = 2,b = 3,
       alpha = NA,
       saturation = 1,
       interpolate = TRUE,
       max.value = 255) +
  tm_shape(gps_path_selected) +
  tm_lines(col = 'id', style = "fixed")
car3 <- car %>%
  filter(CarID %in% c(22, 30, 15))

car3
# A tibble: 3 x 5
  LastName FirstName CarID CurrentEmploymentType CurrentEmploymentTit~
  <chr>    <chr>     <fct> <chr>                 <chr>                
1 Bodrogi  Loreto    15    Security              Site Control         
2 Nubarron Adra      22    Security              Badging Office       
3 Resumir  Felix     30    Security              Security Group Manag~

According to the line graph, Loreto, Adra, and Felix drink coffee at Brew’ve Been Served very frequently. They generally meet around 8:10 am. Habits like this indicate a close relationship between these 3 staff members.

Question 5

Do you see evidence of suspicious activity?

Draw a line graph of the paths recorded late at night (02:00 to 05:59):

gps_path_selected <- gps_path2 %>%
  filter(hour %in% c("02", "03", "04", "05"))
tmap_mode("view")
tm_shape(bgmap) +
  tm_rgb(bgmap, r = 1,g = 2,b = 3,
       alpha = NA,
       saturation = 1,
       interpolate = TRUE,
       max.value = 255) +
  tm_shape(gps_path_selected) +
  tm_lines(col = 'id', style = "fixed")

This graph shows suspicious late-night paths for the people with CarID 24, 21, 15, and 16.

car4 <- car %>%
  filter(CarID %in% c(24, 21, 15, 16))

car4
# A tibble: 4 x 5
  LastName FirstName CarID CurrentEmploymentType CurrentEmploymentTit~
  <chr>    <chr>     <fct> <chr>                 <chr>                
1 Bodrogi  Loreto    15    Security              Site Control         
2 Vann     Isia      16    Security              Perimeter Control    
3 Osvaldo  Hennie    21    Security              Perimeter Control    
4 Mies     Minke     24    Security              Perimeter Control    

All 4 of these people work in Security at GAStech. This employment type would give them an advantage if they had a hidden plan connected to the kidnapping. Their travel paths on the graph are also very similar. For these reasons, the places where they stopped late at night before they met are the most suspicious locations and deserve the police’s attention.

These locations are: Brew’ve Been Served, where they always meet up in the morning; Frydos Autosupply n’ More, which seems to be their late-night meeting place; Chostus Hotel, where they stopped by; Taxiarchon Park, where they stopped on their way to the late-night meetings; Ahaggo Museum, which Minke, Hennie, and Isia always passed before (or after) the late-night meetings; and Spetson Park, which Loreto always passed before (or after) the late-night meetings.